1.
Article in English | MEDLINE | ID: mdl-38051617

ABSTRACT

Computational drug repositioning can identify potential associations between drugs and diseases. This technology has been shown to be effective in accelerating drug development and reducing experimental costs. Although there has been plenty of research for this task, existing methods are deficient in utilizing complex relationships among biological entities, which may not be conducive to subsequent simulation of drug treatment processes. In this article, we propose a heterogeneous graph embedding method called HMLKGAT to infer novel potential drugs for diseases. More specifically, we first construct a heterogeneous information network by combining drug-disease, drug-protein and disease-protein biological networks. Then, a multi-layer graph attention model is utilized to capture the complex associations in the network to derive representations for drugs and diseases. Finally, to maintain the relationship of nodes in different feature spaces, we propose a multi-kernel learning method to transform and combine the representations. Experimental results demonstrate that HMLKGAT outperforms six state-of-the-art methods in drug-related disease prediction, and case studies of five classical drugs further demonstrate the effectiveness of HMLKGAT.


Subjects
Deep Learning , Computer Simulation , Drug Development , Drug Repositioning
2.
Methods ; 218: 48-56, 2023 10.
Article in English | MEDLINE | ID: mdl-37516260

ABSTRACT

Drug repurposing, which typically applies the procedure of drug-disease associations (DDAs) prediction, is a feasible solution to drug discovery. Compared with traditional methods, drug repurposing can reduce the cost and time for drug development and advance the success rate of drug discovery. Although many methods for drug repurposing have been proposed and the obtained results are relatively acceptable, there is still some room for improving the predictive performance, since those methods fail to consider fully the issue of sparseness in known drug-disease associations. In this paper, we propose a novel multi-task learning framework based on graph representation learning to identify DDAs for drug repurposing. In our proposed framework, a heterogeneous information network is first constructed by combining multiple biological datasets. Then, a module consisting of multiple layers of graph convolutional networks is utilized to learn low-dimensional representations of nodes in the constructed heterogeneous information network. Finally, two types of auxiliary tasks are designed to help to train the target task of DDAs prediction in the multi-task learning framework. Comprehensive experiments are conducted on real data and the results demonstrate the effectiveness of the proposed method for drug repurposing.


Subjects
Drug Development , Drug Repositioning , Drug Discovery
3.
IEEE J Biomed Health Inform ; 27(6): 3061-3071, 2023 06.
Article in English | MEDLINE | ID: mdl-37030796

ABSTRACT

In the treatment of bacterial infectious diseases, overuse of antibiotics may lead not only to bacterial resistance to antibiotics but also to dysbiosis of beneficial bacteria, which are essential for maintaining normal human life activities. Phage therapy, which invades and lyses specific pathogenic bacteria without affecting beneficial bacteria, is therefore becoming increasingly popular for treating bacterial infectious diseases. Effective phage therapy requires accurate prediction of potential phage-host interactions from a heterogeneous information network consisting of bacteria and phages. Although many models have been proposed for predicting phage-host interactions, most fail to fully consider the sparsity and unconnectedness of the phage-host heterogeneous information network, resulting in unsatisfactory performance on phage-host interaction prediction. To address this challenge, we propose an effective model called GERMAN-PHI for predicting Phage-Host Interactions via Graph Embedding Representation learning with Multi-head Attention mechaNism. In GERMAN-PHI, the multi-head attention mechanism is used to learn representations of phages and hosts from multiple perspectives of phage-host associations, addressing the sparsity and unconnectedness of the phage-host heterogeneous information network. More specifically, a GAT module with talking-heads is employed to learn representations of phages and bacteria, on which neural induction matrix completion is conducted to reconstruct the phage-host association matrix. Results of comprehensive experiments demonstrate that GERMAN-PHI performs better than state-of-the-art methods on phage-host interaction prediction. In addition, results of a case study on two high-risk human pathogens show that GERMAN-PHI can predict validated phages with high accuracy, and some potential or newly associated phages are provided as well.


Subjects
Bacteriophages , Communicable Diseases , Humans , Bacteria , Anti-Bacterial Agents
4.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36750041

ABSTRACT

Drug-drug interactions (DDIs) are compound effects that arise when patients take two or more drugs at the same time, which may weaken the efficacy of drugs or cause unexpected side effects. Accurately predicting DDIs is therefore of great significance for drug development and drug safety surveillance. Although many methods have been proposed for the task, existing methods do not fully utilize the biological knowledge related to DDIs and do not effectively capture the complex semantics among drug-related biological entities, leading to suboptimal performance. Moreover, the lack of interpretability of the predicted results limits the wide application of existing methods for DDI prediction. In this study, we propose a novel framework for predicting DDIs with interpretability. Specifically, we construct a heterogeneous information network (HIN) by explicitly utilizing the biological knowledge related to the procedure of inducing DDIs. To capture the complex semantics in the HIN, a meta-path-based information fusion mechanism is proposed to learn high-quality representations of drugs. In addition, an attention mechanism is designed to combine semantic information obtained from meta-paths of different lengths to obtain final representations of drugs for DDI prediction. Comprehensive experiments are conducted on 2410 approved drugs, and the predictive performance comparison shows that our proposed framework outperforms selected representative baselines on the task of DDI prediction. The results of the ablation study and the cold-start scenario indicate that the meta-path-based information fusion mechanism is beneficial for capturing the complex semantics among drug-related biological entities. Moreover, the results of the case study demonstrate that the designed attention mechanism is able to provide partial interpretability for the predicted DDIs. Therefore, the proposed method will be a feasible solution to the task of predicting DDIs.
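
To illustrate what a meta-path contributes in a heterogeneous information network, the sketch below counts drug-protein-drug path instances, that is, the number of proteins two drugs share. The choice of the drug-protein-drug meta-path, the function name metapath_counts, and the toy data are assumptions made for illustration; the abstract does not say which meta-paths or entity types the framework actually uses.

```python
def metapath_counts(drug_targets):
    """Count drug-protein-drug path instances for each pair of drugs."""
    drugs = sorted(drug_targets)
    counts = {}
    for i, a in enumerate(drugs):
        for b in drugs[i + 1:]:
            # Each shared protein is one instance of the meta-path a-protein-b.
            shared = drug_targets[a] & drug_targets[b]
            if shared:
                counts[(a, b)] = len(shared)
    return counts


# Hypothetical toy data: each drug maps to the set of proteins it targets.
drug_targets = {
    "drugA": {"P1", "P2"},
    "drugB": {"P2", "P3"},
    "drugC": {"P4"},
}
pair_counts = metapath_counts(drug_targets)
```

Counts like these are one simple way to turn a meta-path into a pairwise feature; a learned fusion or attention step, as described in the abstract, would sit on top of such features.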


Subjects
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions , Semantics
5.
Evol Bioinform Online ; 16: 1176934320970572, 2020.
Article in English | MEDLINE | ID: mdl-33328721

ABSTRACT

Microbial communities are ubiquitous in nature and have a great impact on the living environment and human health. The effects of microbial communities on the environment and their hosts are often referred to as the functions of these communities, which depend largely on community composition. Studying microbial higher-order modules can help us understand the dynamic development and evolution of microbial communities and explore community function. Because traditional clustering methods depend on the number of clusters or are influenced by data that do not belong to any cluster, this paper proposes a hypergraph clustering algorithm based on game theory to mine microbial high-order interaction modules (HCGI). The hypergraph clustering problem naturally turns into a clustering game problem, and the partition of network modules is transformed into finding the critical point of an evolutionarily stable strategy (ESS). The experimental results show that HCGI does not depend on the number of classes and obtains more conservative, better-quality microbial clustering modules, which provides a reference for researchers and saves time and cost. The source code of HCGI can be downloaded from https://github.com/ylm0505/HCGI.

6.
Front Genet ; 10: 1316, 2019.
Article in English | MEDLINE | ID: mdl-31998371

ABSTRACT

miRNAs play an important role in many biological processes, and increasing evidence shows that miRNAs are closely related to human diseases. Most existing miRNA-disease association prediction methods are based only on data related to miRNAs and diseases and fail to use other existing biological data effectively. However, experimentally verified miRNA-disease associations are limited, and there are complex correlations among biological data. Therefore, we propose a novel Three-layer heterogeneous network Combined with unbalanced Random Walk for MiRNA-Disease Association prediction algorithm (TCRWMDA), which can effectively integrate multi-source association data. TCRWMDA is based not only on the known miRNA-disease associations but also adds new prior information (lncRNA-miRNA and lncRNA-disease associations) to build a three-layer heterogeneous network, in which lncRNA serves as an intermediate transition point to mine more effective information between networks. The AUC value obtained by TCRWMDA in 5-fold cross-validation is 0.9209; compared with other models based on the same similarity calculation method, TCRWMDA obtained better results. TCRWMDA was applied to the analysis of four types of cancer, and the results showed that TCRWMDA is an effective tool for predicting potential miRNA-disease associations. The source code and dataset of TCRWMDA are available at: https://github.com/ylm0505/TCRWMDA.
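
An AUC such as the one reported can be read as the probability that a randomly chosen known association is scored above a randomly chosen unknown pair. The sketch below computes that quantity for a single cross-validation fold. The labels and scores are hypothetical and are not data from the paper, and the function name auc is introduced here only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out fold: 1 = known association, 0 = unknown pair.
fold_labels = [1, 1, 0, 0]
fold_scores = [0.9, 0.4, 0.5, 0.1]
fold_auc = auc(fold_labels, fold_scores)
```

In k-fold cross-validation this would be computed once per held-out fold and the fold values averaged.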

7.
BMC Bioinformatics ; 19(Suppl 20): 505, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577738

ABSTRACT

BACKGROUND: The traditional methods of visualizing high-dimensional data objects in low-dimensional metric spaces are subject to the basic limitations of metric space. These limitations result in multidimensional scaling that fails to faithfully represent non-metric similarity data. RESULTS: Multiple maps t-SNE (mm-tSNE) has drawn much attention due to the construction of multiple mappings in low-dimensional space to visualize the non-metric pairwise similarity to eliminate the limitations of a single metric map. mm-tSNE regularization combines the intrinsic geometry between data points in a high-dimensional space. The weight of data points on each map is used as the regularization parameter of the manifold, so the weights of similar data points on the same map are also as close as possible. However, these methods use standard momentum methods to calculate parameters of gradient at each iteration, which may lead to erroneous gradient search directions so that the target loss function fails to achieve a better local minimum. In this article, we use a Nesterov momentum method to learn the target loss function and correct each gradient update by looking back at the previous gradient in the candidate search direction. By using indirect second-order information, the algorithm obtains faster convergence than the original algorithm. To further evaluate our approach from a comparative perspective, we conducted experiments on several datasets including social network data, phenotype similarity data, and microbiomic data. CONCLUSIONS: The experimental results show that the proposed method achieves better results than several versions of mm-tSNE based on three evaluation indicators including the neighborhood preservation ratio (NPR), error rate and time complexity.
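
The difference between standard and Nesterov momentum can be sketched on a toy one-dimensional loss. This is the textbook formulation, not the authors' mm-tSNE code; the loss function, learning rate, momentum coefficient, and the function name minimize are assumptions chosen for illustration.

```python
def minimize(grad, x0, lr=0.1, mu=0.9, steps=100, nesterov=True):
    """Gradient descent on a scalar with classical or Nesterov momentum."""
    x, v = x0, 0.0
    for _ in range(steps):
        # Nesterov evaluates the gradient at the look-ahead point x + mu * v;
        # classical momentum evaluates it at the current point x.
        point = x + mu * v if nesterov else x
        v = mu * v - lr * grad(point)
        x = x + v
    return x


def grad_f(x):
    # Gradient of the toy loss f(x) = (x - 3)^2, whose minimum is at x = 3.
    return 2.0 * (x - 3.0)


x_nesterov = minimize(grad_f, x0=0.0, nesterov=True)
x_classical = minimize(grad_f, x0=0.0, nesterov=False)
```

The only change between the two variants is where the gradient is evaluated, which is why Nesterov momentum is cheap to swap into an existing momentum-based optimizer.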


Subjects
Gene Expression Regulation , Genetic Diseases, Inborn/genetics , Microbiota/genetics , Nonlinear Dynamics , Algorithms , Databases, Genetic , Humans , Phenotype , Time Factors
8.
Indian J Cancer ; 55(4): 340-343, 2018.
Article in English | MEDLINE | ID: mdl-30829267

ABSTRACT

BACKGROUND: Human leukocyte antigen-G (HLA-G) is a tumor-associated molecule whose expression may help cancer cells escape the immune response. AIMS: The aim of this study was to evaluate the diagnostic value of HLA-G level in oral squamous cell carcinoma (OSCC). MATERIALS AND METHODS: A total of 52 patients with a definite pathological diagnosis and 20 healthy controls were enrolled in this clinical trial. Immunohistochemistry (IHC) and quantitative real-time reverse transcription-polymerase chain reaction (RT-PCR) analysis were used for HLA-G identification and multilevel validation. Statistical analysis was performed using SPSS, and statistical significance was determined at P < 0.05. RESULTS: IHC results demonstrated that the expression of HLA-G in OSCC was strongly positive, with a positive expression rate of 55.77% (29/52), whereas the expression of HLA-G in healthy controls was negative (0/20). Furthermore, RT-PCR results showed that the expression of HLA-G messenger RNA was weak in healthy controls but strong in OSCC. In addition, HLA-G expression in the tumors was significantly correlated with histological grade. CONCLUSIONS: Our results suggest that HLA-G is associated with the prognosis of OSCC and may serve as a novel therapeutic target.


Subjects
Biomarkers, Tumor/metabolism , Carcinoma, Squamous Cell/diagnosis , HLA-G Antigens/metabolism , Mouth Neoplasms/metabolism , RNA, Messenger/genetics , Carcinoma, Squamous Cell/pathology , Female , HLA-G Antigens/genetics , Humans , Immunohistochemistry , Male , Mouth Neoplasms/pathology , Neoplasm Staging , Predictive Value of Tests , Prognosis
9.
PLoS One ; 12(10): e0186134, 2017.
Article in English | MEDLINE | ID: mdl-29045465

ABSTRACT

Identifying protein complexes is an important and challenging task in proteomics that would contribute greatly to our knowledge of the molecular mechanisms of cell life activities. However, the inherent organization and dynamic characteristics of the cell system have rarely been incorporated into existing algorithms for detecting protein complexes because of the limitations of protein-protein interaction (PPI) data produced by high-throughput techniques. The availability of time-course gene expression profiles enables us to uncover the dynamics of molecular networks and improve the detection of protein complexes. To this end, this paper proposes a novel algorithm, DCA (Dynamic Core-Attachment). It detects protein-complex cores consisting of continually expressed and highly connected proteins in a dynamic PPI network, and then forms each protein complex by adding attachments with high adhesion to the core. The integration of the core-attachment feature into the dynamic PPI network is responsible for the superiority of our algorithm. DCA has been applied to two different yeast dynamic PPI networks, and the experimental results show that it performs significantly better than state-of-the-art techniques in terms of prediction accuracy, hF-measure and biological statistical significance. In addition, the identified complexes with strong biological significance provide potential candidate complexes for biologists to validate.


Subjects
Algorithms , Multiprotein Complexes/metabolism , Protein Interaction Maps , Gene Regulatory Networks , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
10.
Methods ; 124: 120-125, 2017 07 15.
Article in English | MEDLINE | ID: mdl-28625914

ABSTRACT

The microbiota show remarkable variability within individuals. At the same time, the microorganisms living in the human body play a very important role in health and disease, so identifying the relationships between microbes and diseases will contribute to a better understanding of microbe interactions and functional mechanisms. However, while sequencing technologies produce large volumes of microbial data, very few associations between diseases and microbes are known. In bioinformatics, many researchers use network topology analysis to address such problems. Inspired by this idea, we propose a new method for prioritizing candidate microbes to predict potential disease-microbe associations. First, we connect the disease network and the microbe network using known disease-microbe relationships to construct a heterogeneous network. We then extend the random walk to the heterogeneous network and evaluate the method using leave-one-out cross-validation and ROC curves. The algorithm can effectively disclose potential associations between diseases and microbes that cannot be found using the microbe network or the disease network alone. Furthermore, we study three representative diseases, Type 2 diabetes, Asthma and Psoriasis, and present the potential microbes associated with each by ranking candidate disease-causing microbes. We conclude that the discovery of new associations will be valuable for understanding disease mechanisms, diagnosis and therapy.
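
The ranking step can be sketched as a random walk with restart over a merged disease-microbe graph. This is a simplified illustration: it treats the heterogeneous network as a single unweighted graph and omits any separate jumping probability between the two networks, which the abstract does not specify. The function name, the restart probability, and the toy edges are assumptions.

```python
def random_walk_with_restart(edges, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at `seed`."""
    # Build an undirected adjacency list from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    p0 = {n: 1.0 if n == seed else 0.0 for n in neighbors}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to the seed; otherwise move
        # to a uniformly chosen neighbor of the current node.
        nxt = {n: restart * p0[n] for n in neighbors}
        for u, nbrs in neighbors.items():
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in neighbors)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network mixing the three edge types.
edges = [
    ("disease:A", "disease:B"),    # disease-disease similarity
    ("microbe:m1", "microbe:m2"),  # microbe-microbe similarity
    ("microbe:m2", "microbe:m3"),
    ("disease:A", "microbe:m1"),   # known disease-microbe associations
    ("disease:B", "microbe:m2"),
]
scores = random_walk_with_restart(edges, seed="disease:A")
ranking = sorted((n for n in scores if n.startswith("microbe:")),
                 key=scores.get, reverse=True)
```

Here ranking lists candidate microbes for the seed disease from most to least likely under the walk; leave-one-out evaluation would remove one known association at a time and check where the held-out microbe lands in that list.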


Subjects
Algorithms , Asthma/genetics , Diabetes Mellitus, Type 2/genetics , Host-Pathogen Interactions , Microbiota/genetics , Psoriasis/genetics , Asthma/microbiology , Asthma/pathology , Diabetes Mellitus, Type 2/microbiology , Diabetes Mellitus, Type 2/pathology , Gene Regulatory Networks , Humans , Protein Interaction Mapping , Psoriasis/microbiology , Psoriasis/pathology , ROC Curve , Systems Biology/methods , Systems Biology/statistics & numerical data
11.
Methods ; 110: 44-53, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27405005

ABSTRACT

Protein complexes, composed of interacting proteins in a protein-protein interaction network (PPI network), play a central role in driving biological processes within cells. Recently, a growing number of swarm-intelligence-based algorithms for detecting protein complexes have emerged and have become a research hotspot in proteomics. In this paper, we propose a novel algorithm for identifying protein complexes based on a brainstorming strategy (IPC-BSS), which integrates the main idea of swarm intelligence optimization with an improved K-means algorithm. The distance between nodes in the PPI network is defined by combining network topology and gene ontology (GO) information. Inspired by the human brainstorming process, the IPC-BSS algorithm first selects the clustering center nodes and then consolidates each of them with the other nodes at short distance to form initial clusters. Finally, we put forward two ways of updating the initial clusters to search for optimal results. Experimental results show that IPC-BSS outperforms other classic algorithms on yeast and human PPI networks and obtains many predicted protein complexes with biological significance.


Subjects
Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Proteomics/methods , Algorithms , Cluster Analysis , Humans , Saccharomyces cerevisiae/genetics
12.
Methods ; 110: 90-96, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27320204

ABSTRACT

Detection of temporal protein complexes would be a great aid in furthering our knowledge of the dynamic features and molecular mechanisms of cell life activities. Most existing clustering algorithms for discovering protein complexes are based on static protein interaction networks, in which the inherent dynamics are often overlooked. We propose a novel algorithm, DPC-NADPIN (Discovering Protein Complexes based on Neighbor Affinity and Dynamic Protein Interaction Network), to identify temporal protein complexes from time-course protein interaction networks. Inspired by the idea that the more tightly a protein's neighbors inside a module are connected, the more likely the protein is to belong to that module, DPC-NADPIN first consolidates each protein with a high clustering coefficient and its neighbors into an initial cluster. The initial cluster then becomes a protein complex by appending neighbor proteins according to the relationship between the affinity among neighbors inside the cluster and that outside the cluster. In our experiments, DPC-NADPIN proves to be reasonable and discovers protein complexes better than the following state-of-the-art algorithms: Hunter, MCODE, CFinder, SPICI, and ClusterONE. It also obtains many protein complexes with strong biological significance, which provide helpful biological knowledge to researchers in the field. Moreover, we find that proteins are assembled coordinately to form protein complexes with temporal and spatial characteristics, thereby performing specific biological functions.
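
The seeding step can be sketched with the standard local clustering coefficient. This is only an illustration of the first stage: the threshold of 0.5, the function names, and the toy network are assumptions, and the affinity-based expansion that turns an initial cluster into a complex is not shown because the abstract does not define it.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def seed_clusters(adj, threshold=0.5):
    """Initial cluster (protein plus its neighbors) for each high-coefficient protein."""
    return {
        node: {node} | adj[node]
        for node in adj
        if clustering_coefficient(adj, node) >= threshold
    }


# Hypothetical toy interaction network as an undirected adjacency map.
adj = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
seeds = seed_clusters(adj)
```

In this toy network P2 and P3 sit inside a triangle, so both seed the same initial cluster; a full implementation would merge such duplicates before expansion.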


Subjects
Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Algorithms , Cluster Analysis , Multiprotein Complexes/genetics
13.
PLoS One ; 11(4): e0153967, 2016.
Article in English | MEDLINE | ID: mdl-27100396

ABSTRACT

The identification of temporal protein complexes would make great contribution to our knowledge of the dynamic organization characteristics in protein interaction networks (PINs). Recent studies have focused on integrating gene expression data into static PIN to construct dynamic PIN which reveals the dynamic evolutionary procedure of protein interactions, but they fail in practice for recognizing the active time points of proteins with low or high expression levels. We construct a Time-Evolving PIN (TEPIN) with a novel method called Deviation Degree, which is designed to identify the active time points of proteins based on the deviation degree of their own expression values. Owing to the differences between protein interactions, moreover, we weight TEPIN with connected affinity and gene co-expression to quantify the degree of these interactions. To validate the efficiencies of our methods, ClusterONE, CAMSE and MCL algorithms are applied on the TEPIN, DPIN (a dynamic PIN constructed with state-of-the-art three-sigma method) and SPIN (the original static PIN) to detect temporal protein complexes. Each algorithm on our TEPIN outperforms that on other networks in terms of match degree, sensitivity, specificity, F-measure and function enrichment etc. In conclusion, our Deviation Degree method successfully eliminates the disadvantages which exist in the previous state-of-the-art dynamic PIN construction methods. Moreover, the biological nature of protein interactions can be well described in our weighted network. Weighted TEPIN is a useful approach for detecting temporal protein complexes and revealing the dynamic protein assembly process for cellular organization.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteins/metabolism , Humans , Proteins/genetics , Time Factors
14.
Int J Data Min Bioinform ; 13(4): 378-94, 2015.
Article in English | MEDLINE | ID: mdl-26547985

ABSTRACT

In this paper, we develop a novel regularisation method for MVAR via weighted fusion, which considers the correlation among variables. In theory, we discuss the grouping effect of weighted fusion regularisation for linear models. Using a probabilistic method, we show that coefficients corresponding to highly correlated predictors have small differences. A quantitative estimate of these differences is given regardless of the signs of the coefficients. The estimate is further improved when the empirical approximation error is considered, provided the model fits the data well. We then apply the proposed model to several time series data sets, in particular a time series dataset of human gut microbiomes. The experimental results indicate that the new approach performs better than several other VAR-based models, and we also demonstrate its capability of extracting relevant microbial interactions.


Subjects
Bacterial Physiological Phenomena , Cell Communication/physiology , Intestines/microbiology , Microbiota/physiology , Models, Biological , Models, Statistical , Computer Simulation , Humans , Regression Analysis
15.
Int J Data Min Bioinform ; 11(4): 458-73, 2015.
Article in English | MEDLINE | ID: mdl-26336669

ABSTRACT

To overcome the limitations of global modularity and the deficiency of local modularity, we propose a hybrid modularity measure, Local-Global Quantification (LGQ), which considers global and local modularity together. LGQ uses an adjustable module feature parameter to balance global detection capability and local search capability in a Protein-Protein Interaction (PPI) network. Furthermore, we develop a new protein complex mining algorithm called Best Neighbour and Local-Global Quantification (BN-LGQ), which integrates the best neighbour node and the modularity increment. BN-LGQ expands a protein complex by quickly searching for the best neighbour node of the current cluster and using the modularity increment as the metric that determines whether that node can join the cluster. The experimental results show that BN-LGQ predicts protein complexes more accurately and matches the reference protein complexes more closely than the MCL and MCODE algorithms. Moreover, BN-LGQ can effectively discover protein complexes with better biological significance in the PPI network.


Subjects
Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/classification , Proteomics/methods , Algorithms
16.
BMC Bioinformatics ; 16 Suppl 12: S5, 2015.
Article in English | MEDLINE | ID: mdl-26330105

ABSTRACT

BACKGROUND: The identification of protein functional modules would be a great aid in furthering our knowledge of the principles of cellular organization. Most existing algorithms for identifying protein functional modules share a common defect: once a protein node is assigned to a functional module, it cannot be moved to another functional module in later steps, so erroneous partitioning from an earlier step accumulates to the end. RESULTS: In this paper, we design a new algorithm, ADM (Adaptive Density Modularity), to detect protein functional modules based on adaptive density modularity. In the ADM algorithm, by comparing the external closely associated degree with the internal closely associated degree, the partitioning of a protein-protein interaction network into functional modules always evolves quickly to increase the density modularity of the network. The integration of density modularity into the new algorithm not only overcomes the drawback mentioned above but also helps identify protein functional modules more effectively. CONCLUSIONS: The experimental results reveal that the ADM algorithm is superior to many state-of-the-art protein functional module detection techniques in terms of prediction accuracy. Moreover, the identified protein functional modules are statistically significant in terms of "Biological Process" annotated in Gene Ontology, which provides substantial support for revealing the principles of cellular organization.


Subjects
Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Supervised Machine Learning , Algorithms , Gene Ontology
17.
IEEE Trans Nanobioscience ; 14(5): 521-7, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26080386

ABSTRACT

Prioritization of disease-causing genes is very important for understanding disease mechanisms and for biomedical applications such as drug design. Previous studies have shown that promising candidate genes are mostly ranked according to their relatedness to known disease genes or closely related disease genes. Therefore, a dangling gene (isolated gene) with no edges in the network cannot be effectively prioritized. These approaches tend to prioritize genes that are highly connected in the PPI network, while they perform poorly when applied to loosely connected disease genes. To address these problems, we propose a new method for prioritizing disease-causing genes based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross-validation on 1931 diseases in which at least one gene is known to be involved, and it is able to rank the true causal gene first in 849 of all 2542 cases. The experimental results suggest that NDRC significantly outperforms other existing methods such as RWR, VAVIEN, DADA and PRINCE in identifying loosely connected disease genes and successfully ranks dangling genes as potential candidate disease genes. Furthermore, we apply the NDRC method to study three representative diseases: Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study also finds that certain complex disease-causing genes can be divided into several modules that are closely associated with different disease phenotypes.


Subjects
Computational Biology/methods , Disease/genetics , Protein Interaction Maps/genetics , Proteins/genetics , Algorithms , Humans
18.
BMC Bioinformatics ; 15 Suppl 12: S7, 2014.
Article in English | MEDLINE | ID: mdl-25474367

ABSTRACT

BACKGROUND: In recent years, many protein complex mining algorithms, such as the classical clique percolation method (CPM) and the Markov clustering (MCL) algorithm, have been developed for protein-protein interaction networks. However, most of the available algorithms primarily concentrate on mining dense protein subgraphs as protein complexes, failing to take into account the inherent organizational structure within protein complexes. Thus, there is a critical need to study the possibility of mining protein complexes using the topological information hidden in edges. Moreover, recent large-scale experimental analyses reveal that protein complexes have their own intrinsic organization. METHODS: Inspired by the formation process of cliques in complex social networks and by the centrality-lethality rule, we propose a new protein complex mining algorithm called the Multistage Kernel Extension (MKE) algorithm, which integrates the idea of critical protein recognition in the Protein-Protein Interaction (PPI) network. MKE first recognizes the nodes with high degree as the first-level kernel of a protein complex, and then adds the weighted best neighbour node of the first-level kernel to the current kernel to form the second-level kernel of the protein complex. This process is repeated, extending the current kernel to form a protein complex. In the end, overlapping protein complexes are merged to form the final protein complex set. RESULTS: MKE has better accuracy than the classical clique percolation method and the Markov clustering algorithm. MKE also performs better than the classical clique percolation method on both Gene Ontology semantic similarity and co-localization enrichment, and can effectively identify protein complexes with biological significance in the PPI network.


Subjects
Algorithms , Multiprotein Complexes/metabolism , Protein Interaction Mapping/methods , Cluster Analysis , Protein Interaction Maps
19.
IEEE Trans Nanobioscience ; 13(2): 80-8, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24803023

ABSTRACT

Recent studies have shown that a protein complex is composed of core proteins and attachment proteins, and that proteins inside the core are highly co-expressed. Based on this concept, we reconstruct a weighted PPI network using gene expression data and develop a novel edge-based protein complex identification algorithm (PCIA-GeCo). First, we select the edges with a high co-expression coefficient as seeds to form the preliminary cores. Then, the preliminary cores are filtered according to the weighted density of the complex core to obtain the unique cores. Finally, the protein complexes are generated by identifying attachment proteins for each core. Our method is compared comprehensively with three existing algorithms, HUNTER, COACH and CORE, in terms of F-measure, coverage rate and P-value by matching the predicted complexes against benchmark complexes. The evaluation results show that PCIA-GeCo is effective and identifies protein complexes more accurately.
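
The edge-weighting and seed-selection steps can be sketched as follows. The sketch assumes the co-expression coefficient is the Pearson correlation of two proteins' expression profiles, which the abstract does not state; the 0.8 threshold, the function names, and the toy profiles are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def weight_edges(edges, expression):
    """Attach a co-expression weight to every interaction edge."""
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}


# Hypothetical expression profiles over four time points.
expression = {
    "YA": [1.0, 2.0, 3.0, 4.0],
    "YB": [2.0, 4.0, 6.0, 8.0],  # rises together with YA
    "YC": [4.0, 3.0, 2.0, 1.0],  # falls while YA rises
}
edges = [("YA", "YB"), ("YA", "YC")]
weights = weight_edges(edges, expression)
seed_edges = [edge for edge, w in weights.items() if w >= 0.8]
```

Only the strongly co-expressed edge survives as a seed here; filtering cores by weighted density and adding attachment proteins would follow as separate steps.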


Subjects
Algorithms , DNA-Binding Proteins/metabolism , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins/metabolism , DNA Repair , DNA-Binding Proteins/genetics , Databases, Protein , Gene Expression , Saccharomyces cerevisiae Proteins/genetics
20.
IEEE Trans Nanobioscience ; 13(2): 89-96, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24803142

ABSTRACT

This paper proposes a novel algorithm based on Connected Affinity Clique Extension (CACE) for mining overlapping functional modules in protein interaction networks. In this approach, the protein connected affinity, which is inferred from protein complexes, is interpreted as the reliability and likelihood of an interaction. The protein interaction network is constructed as a weighted graph in which each weight depends on the connected affinity coefficient. Experimental results on two test data sets show that CACE detects functional modules more effectively and accurately than the state-of-the-art algorithms CPM and IPC-MCE.


Subjects
Algorithms , Fungal Proteins/metabolism , Protein Interaction Mapping/methods , Protein Interaction Maps